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Abstract Contour tracking in adverse environments 
is a challenging problem due to cluttered background, 
illumination variation, occlusion, and noise, among 
others. This paper presents a robust contour track- 
ing method by contributing to some of the key is- 
sues involved, including (a) a region functional for- 
mulation and its optimization; (b) design of a ro- 
bust and effective feature; and (c) development of 
an integrated tracking algorithm. First, we formulate 
a region functional based on robust Earth Mover's 
distance (EMD) with kernel density for distribution 
modeling, and propose a two-phase method for its 
optimization. In the first phase, letting the candidate 
contour be fixed, we express EMD as the transporta- 
tion problem and solve it by the simplex algorithm. 
Next, using the theory of shape derivative, we make 
a perturbation analysis of the contour around the 
best solution to the transportation problem. This 
leads to a partial differential equation (PDE) that 
governs the contour evolution. Second, we design a 
novel and effective feature for tracking applications. 
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We propose a dimensionality reduction method by 
tensor decomposition, achieving a low-dimensional 
description of SIFT features called Tensor-SIFT for 
characterizing local image region properties. Appli- 
cable to both color and gray- level images, Tensor- 
SIFT is very distinctive, insensitive to illumination 
changes, and noise. Finally, we develop an integrated 
algorithm that combines various techniques of the 
simplex algorithm, narrow-band level set and fast 
marching algorithms. Particularly, we introduce an 
inter-frame initialization method and a stopping cri- 
terion for the termination of PDE iteration. Experi- 
ments in challenging image sequences show that the 
proposed work has promising performance. 

Keywords Contour tracking • Tensor-Decomposition • 
Shape Derivative • Earth Mover's Distance (EMD) • 
Kernel Density. 



1 Introduction 

Contour tracking [3[ has been an active research 
topic thanks to its capability to follow the deformable 
boundaries. It has widespread applications in various 
fields such as medical image processing, intelligent 
surveillance and human-machine interaction. The pi- 
oneering work of contour tracking can be traced back 
to Kass et al. 



22| . They defined an energy functional 



of the object boundary and solved the functional 
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optimization problem using the calculus of varia- 
tion (Euler-Lagragian equation). The resulting par- 
tial differential equation (PDE) was solved by the 
finite difference method that was iterated until con- 
vergence. 

Over the past two decades, a great deal of liter- 
ature has been published and considerable advances 
have been made in the field of contour tracking. 
In terms of the type of information used, contour 
tracking methods can be roughly divided into three 
classes: boundary-based [sl,!^ which depends on im- 
age features along the object contour, region-based 



39, 



47| which prefers the information within the 



region enclosed by the object boundary, and those 
based on combination of the both [59]. From the 
perspective of contour representation, two kinds of 
methods can be classified: parametric contour meth- 
ods [4^ where the contours are explicitly represented 



by the parametric curves, and geometric contour ap- 
proaches [sl, where the contours are implicitly rep- 
resented by zero level sets of some high dimensional 
functions [37| . The former often has fast convergence 
speed but provides less flexibility in handling curve 
topology, whereas the latter tends to be computa- 
tionally expensive but can naturally handle topolog- 
ical deformation. 

Despite the varying categorizations of and the 
different techniques used in contour tracking meth- 
ods, three of the key issues are required to be re- 
searched: (a) how to formulate functionals and how 
to optimize them; (b) how to seek effective features 
for object representation; and (c) how to develop a 
robust and accurate tracking algorithm. 

As regards functional formulation, the classical 
■nethod. Ha define, he energy functionals that 
involve terms of image features and the smoothness 
constraint on the boundary curves. Other typical ap- 
proaches are based on the Bayesian inference of re- 
gion segmentation JiqU39I , or based on the Mumford- 
Shah functionals [6|, |46|. In recent years, a class of 
functionals has been presented independently that 
measures the similarity or distance between the tar- 

The common 



methods for functional optimization use the calculus 
of variations to derive the corresponding PDEs. For 
some complex functionals involving region terms, de- 
riving the corresponding PDEs via the calculus of 
variation is nontrivial because Green's theorem has 
to be used to transform the region functionals into 
boundary functionals ll | . However, using the theory 
of shape derivatives [50|, we can deal with such re- 
gion functionals in a straightforward and principled 
manner 



0. 



get and candidate models 
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The Earth Mover's Distance (EMD) [48|^nown 
as the Wasserstein distance in mathematics [45] , has 
been proven to be a robust distance measure be- 
tween two distributions, outperforming commonly 
used probability measures such as Jeffrey divergence, 

statistics, and Li distance. Hence, in this paper, 
we formulate a region functional based on EMD with 
kernel density to represent the distributions for con- 
tour tracking. Optimizing this functional is nontriv- 
ial, and thus we propose a two-phase method for its 
optimization. In the first phase, assuming the can- 
didate shape is fixed, we express the EMD as the 
transportation pro blem and solve it using the sim- 
plex algorithm [30|. In the second phase, using the 
theory of shape derivative, we make a perturbation 
analysis of the candidate contour around the best 
solution to the transportation problem. Thus, we de- 
rive the PDE that governs the contour evolution. 

The features often used in contour tracking are 
image gradient, color, texture, optical flows, or inter- 
frame difference. Earlier works have focused on gra- 
dients HQ of the boundary delineating the ob- 
ject and the background. However, in cluttered im- 
ages or texture images, the object boundary may 
be undistinguishable from the background. In addi- 
tion, the image gradients are susceptible to noises 
present in the image. Probabilistic modeling of color 
is very common in the contour tracking papers [l6| . 
As color features are sensitive to illumination vari- 
ations, an online model update is usually required 
for robust tracking [3l|. Under the assumptions of 
brightness constancy and smoothness [15], optical 
flows describing the apparent motion of appearance 



Tensor-SIFT based Earth Mover's Distance for Contour Tracking 



3 



can be effectively used [18|. For videos captured by 
a still camera, inter-frame differences are useful for 
moving object tracking [ssl, whereas the global mo- 
tion of the camera has to be estimated to compen- 
sate for the object motion [iqI for videos captured 
by mobile cameras. For enhancing the discriminating 
power of image features, Brox et al. Q investigated 
combination of color, texture, and motion in the ac- 
tive contour framework. 

This paper proposes a novel and effective feature 
called tensor-SIFT for contour tracking application. 
SIFT features 0, S have proven to be very effec- 
tive in describing local image characteristics. How- 
ever, the high dimensionality of SIFT features limits 
its application in tracking fields. Hence, we introduce 
a tensor decomposition method for its dimensional- 
ity reduction. The resulting low dimensional feature, 
which is applicable to both color and gray-level im- 
ages, is very distinctive and insensitive to illumina- 
tion change and noise. The idea is that we regard 
SIFT features as a tensor (SIFT- As- Tensor), and 
thus tensor decomposition [23| can be applied which 
captures multi-factor (i.e., two-dimensional spatial lay- 
out and phase histogram) relationships inherently 
present in the image data [53|, |5J. In contrast, the 
methods of vectorizing SIFT features (SIFT- As- Vector) 
can only capture one-factor of the histogram infor- 
mation while losing the two-dimensional spatial lay- 
out in the data. 

Based on the functional formulation and Tensor- 
SIFT described previously, we finally develop an in- 
tegrated contour tracking algorithm. This algorithm 
combines multiple techniques intended for effective 
contour evolution, including the simplex algorithm 
[30! , and the level set algorithm, and the re- initialization 



The criterion fits the most recent EMD values with 
straight lines, and terminates the iteration when the 
line slope does not decrease any more. The itera- 
tion is also halted if the area variation between two 
consecutive frames is large, considering the spatio- 
temporal continuity. 

The remainder of the paper is organized as fol- 
lows. Section [2] begins with an overview of the lit- 
erature related to the paper. Section [3] explores the 
novel feature of Tensor-SIFT. Section H] formulates 
the region functional and proposes a two-phase method 
for its optimization. Section [5] introduces an inte- 
grated contour tracking algorithm and analyzes its 
computational cost. Section [6] presents the exper- 
iments and the corresponding discussions. Finally, 
the concluding remarks are given in section [71 



2 Related Work 

This section reviews the studies that are related to 
this paper, including sudies on the EMD, on SIFT 
features, and on tensor decomposition. 



2.1 Transportation Problem and EMD 

The transportation problem, also known as the Monge- 
Kantorivich problem, can be traced back to the 18th 
century [ssl; it was reformulated and extended by 



algorithm of fast marching [36|, 49|. Particularly, we 
introduce an inter- frame initialization scheme. This 
scheme exploits the mean shift iteration to initial- 
ize the current frame by the tracking result at the 
previous frame. It can provide a more accurate ini- 
tial contour and is helpful in avoiding possible lo- 
cal minimum in the functional. We also introduce a 
stopping criterion for terminating the PDE iteration. 



Kantorovich [2l| as a minimal distance problem (called 
Wasserstein distance) between two probability mea- 
sures. Over the past years, it has been widely inves- 
tigated and has found a variety of applications in 
many fields [45]. The discrete transportation prob- 
lem is well studied in linear programming and can 
be solved by the simplex algorithm [sol . 

One of the first papers published is Peleg et al. 
[43], which introduced the transportation problem 
into image processing. The authors measured the 
distance between two gray-level images for image 
matching. Haker et al. [17] computed image reg- 
istration and warping maps based on the Monge- 
Kantorovich theory of optimal mass transport and 
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developed an efficient PDE for solving the Wasser- 
stein distance. Rubner et al. [48] proposed a cross-bin 
probability measure, EMD, for distributions match- 
ing in content-based image retrieval. They showed 
the distinct advantages of the EMD over the other 
commonly used point- wise probability measures Q . 
As EMD computation in high dimensional cases is 
computationally expensive, recent research efforts are 
devoted to presenting new EMD variants and/or to 
developing fast algorithms for EMD computation [26, 



41 



42| . In the one-dimensional case the closed- form 



solution of EMD (Wasserstein distance) is used in 
a region-based contour model for image segmenta- 
tion [35I. Application of EMD to object tracking 
was studied by Zhao et al. [ssl. In this study, by de- 
scribing the object shapes with ellipses, they adopted 
EMD to compare the reference and candidate mod- 
els, and proposed an efficient gradient descent method 
called differential EMD (DEMD) to estimate the 
translation of elliptical objects. 



Our work is similar to that of Ni et al. [35| in the 
sense that both nonparametric density and region- 
based Wasserstein distance are used in the active 
contour framework. The main difference is that in a 
multiple-dimensional case such as ours, the closed- 
form solution of the EMD does not exist, and there- 
fore the method of Ni et al. [35] is not applicable to 
our problem. Our work is also motivated by [58| ; the 
primary distinction is that our focus is on how to fol- 
low the complex contour shape and non-rigid defor- 
mation rather than the simple elliptical shapes with 
only the motion of translation. This naturally leads 
to a different paradigm-functional formulation and 
its optimization through PDE. Establishing region 
functionals by probability similarity or distance mea- 
sures has also been investigated by previous studies 
16, 20, 5^. However, our work differs from these in 
the probability measures adopted: we argue that for 
contour tracking problems EMD is more robust than 
other commonly used point-wise probability mea- 
sures. 



2.2 SIFT Features 



The SIFT algorithm includes two main steps: key - 



28|, l2c 



point detection and keypoint description 
The first step consists of local extrema (called key- 
points) detection, followed by keypoints localization 
and the dominant orientation determination. In this 
way, generally sparse keypoints are identified. In the 
second step, each keypoint is represented by phase 
histograms of pixel gradient magnitudes (hereafter 
called SIFT feature) in the image patch centered at 
the keypoint. The SIFT algorithm has demonstrated 
successful applications in object recognition 28, 3^ , 
image classification 11 1, image retrieval and so 
on. 

On the other hand, dense SIFT features com- 
puted at regular grid points, without the preceding 
step of keypoint detection, have also shown promis- 
ing performance in scene category recognition 1241 , 
25| or in aligning images of complex scenes [27|. 
Because SIFT features are of high dimensionality, 
it is desirable to get a low-dimensional description 
for both computational efficiency and feasible prob- 
abilistic modeling (e.g., histogram). In the bag-of- 
feature literature, the common practice is to reduce 
the dimensionality of SIFT feature by creating code- 
words through clustering algorithms followed by in- 
dex histogram 24, 25|. Dimensionality reduction of 
SIFT features by PGA is also a natural choice 0. 
Rather than using the SIFT descriptor proposed by 
Lowe [29], Yan and Sukthankar [56| described the 
characteristics of an image patch by concatenating 
into a vector the horizontal and vertical gradients of 
each image pixel in this patch and then used PCA 
for dimensionality reduction. 

However, a SIFT feature is actually a 4x4x8 his- 
togram array of gradient magnitudes; its locally two- 
dimensional spatial information inherently present 
is lost whe n p acked as a 128-element vector 
25 



27 



29 



32 
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[3^ . This analysis motivates us to 
exploit tensor decomposition for dimensionality re- 
duction of SIFT features, which distinguishes our 
work from the previous ones. As tensor decompo- 
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sition method can capture multi-factor relationships 
inherently present in data, the proposed low- dimen- 
sional feature, Tensor-SIFT, is more powerful in cap- 
turing both spatial layout and histogram informa- 
tion in describing the patch characteristics. 



2.3 Tensor Decomposition 

A tensor is a high-dimensional array, and can be 
considered a generalization of a vector (first-order 
tensor) and a matrix (second-order tensor). Tensor 
decomposition is one of the topics in multi-linear al- 
gebra, which reveals that a higher-order tensor is 
formed by a confluence of multiple factors, and its 
decomposition consists in exploring these multiple 
factors inherently present in data. Vasilescu and Ter- 
zopoulos [52| proposed a method of TensorFaces for 
face recognition involving multiple factors, such as 
facial geometry, expression, pose, and illumination. 
They used a technique of N- modes SVD, a multi- 
linear extension to the classical matrix SVD, achiev- 



ing significantly better recognition rates over the clas- 
sical method of eigenfaces [5l|, which is dependent 
on linear PCA. Based on N-modes SVD, they also 
demonstrated effective dimensionality reduction in 
facial image ensembles [ssl. Wang and Ahuja 
presented a dimensionality reduction approach based 
on tensor-decomposition for effectively capturing the 
spatial and temporal redundancies. Their method 
achieved the most compact data representations among 
the state of the art 1541. 



3.1 SIFT Feature as a Multi-Dimensional Array 

The computation of SIFT is described briefly as fol- 
lows [refer to Lowe |29| for details]. First, for the 
interest point, we determine an 8x8 sampling image 
window centered at this point and then compute the 
gradient magnitude weighted by a Gaussian function 
and phase of every pixel. Next, the sampling window 
is regularly divided into 4x4 sub- windows; in each 
subwindow [0, 27r] is uniformly partitioned into eight 
intervals, and the corresponding phase histogram is 
computed. To avoid boundary effects, trilinear inter- 
polation is exploited to distribute the value of each 
gradient sample into the neighboring phase bins. 



Fig. 1(a) shows an 8x8 image window. Fig. 1(b) 
presents the corresponding phase histograms com- 
puted in this window, where the arrow direction is 
the histogram bin indicator, and its length signifies 
the histogram bin value. Traditionally, this 4x4 his- 
togram array is packed into a vector of 128 as shown 



in Fig. 1(c) As described previously, one SIFT fea- 
ture describes the local region characteristics associ- 
ated with the center pixel, including both the sub- 
windows' spatial layout and their phase histograms. 
Vectorization of SIFT feature (SIFT- As- Vector) re- 
sults in the loss of spatial layout inherently present in 
data. Clearly, the data of 4x4x8 array are produced 
by the confluence of multiple factors, i.e., 2D spatial 
layout and histogram. Thus, the best way is to see 
the data "as is". This naturally leads to our view 
of "SIFT- As-Tensor" : one SIFT feature is a third- 
order tensor, and a set of SIFT features is thus a 
fourth-order tensor. 



3 Tensor-SIFT: Dimensionality Reduction of 
SIFT Features by Tensor Decomposition 

In this section, we first interpret the SIFT feature 
from two different views: SIFT- As- Tensor and SIFT- 
As- Vector. We then describe the method of tensor- 
decomposition for dimensionality reduction of SIFT 
feature, achieving our novel feature of Tensor-SIFT. 



3.2 Dimensionality Reduction by Tensor 
Decomposition 

An A^th-order or A/"- way tensor X is an A/"- dimensional 
array. Mathematically, it is an element of the direct 
product of N vector spaces, i.e., X G R^ix^sx - x/at^ 
where J^, n = 1, . . . , A", denotes the size of dimen- 
sion n. A tensor is rank-one if it can be written as the 
outer product of N vectors. For easy mathematical 
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Fig. 1 Illustration of one SIFT feature as a multi-dimensional array vs as a vector. An 8 by 8 sampling image window 
(a) is regularly divided into 4 by 4 sub- windows. For each sub- window an 8-bin phase histogram of gradient magnitudes 
is computed, producing a three-dimensional, 4 by 4 by 8, array (b), where the arrow direction is the histogram bin 
indicator, and its length signifies the histogram bin value. Traditionally, this three-dimensional array is vectorized into a 
one-dimensional vector of 128 (c), resulting in the loss of two-dimensional spatial layout inherently present in the data. 

manipulation a tensor is often flattened or unfolded where || • || denotes the tensor norm, i.e., the sum of 



into a matrix. The mode-n matrix of a A/'-way ten- 
sor is a In by (/i • • • /n-i^+i • • • ^at) matrix X'^''), 
where the tensor element indexed by (ii, ^2, • • • , ^at) 
is mapped to the matrix element indexed by (in, j) 
where 



N 



k-1 



j = l + ^(i,-l) H Irr 



m—1 



The idea of CANDECOMP/PARAFAC (CP) de- 
composition [ist is to approximate a tensor with a 
sum of component rank-one tensors. In our case, 
a set of SIFT features !J is a fourth-order tensor, 
^ ]^/ix/2x/3x/4^ where h is the number of SIFT 
features, /2 = 4, /s = 4 denote the numbers of hori- 
zontal and vertical sub- windows in the sampling im- 
age window, respectively, and /4 = 8 denotes the 
histogram bin number. Its CP decomposition can be 
described by 



argmin \\3^ - 

3" 
K 



n 



J = ffc o rfc o Sfc o tfc 



(1) 



the square of each element in the tensor, o denotes 



outer product, G 



R^^, and K is the number of component rank-one 
tensors. 

Denoting F = [fi ... f^] and likewise for R, S 
and T, the minimization problem becomes how to 
seek the above four matrices that satisfy Jp. We use 
alternating least square (ALS) algorithm [23| for its 
solution; i.e., we alternatively fix three of the four 
matrices, solving for the remaining one. We repeat 
this procedure until the error is less than a threshold 
or the maximum number of iterations is reached. The 
problem is reduced to a least square problem when 
fixing all but one matrix. For instance, let R, S, and 
T be fixed. The minimization problem is reduced to 
the least square problem in matrix form: 

argmin HF^^^ - F(R S TfWr 

F 



(2) 



where F^^^ is the mode-n matrix of the tensor 9^, 
II • \\f stands for the matrix Frobenius norm, and 
denotes the Khatri-Rao product of two matrices. 
The Khatri-Rao product is defined as the column- 
wise Kronecker product of two matrices, e.g., R0 S 
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is a matrix of 12/3 by K\ 



4.1 Formulation of the Region Functional 



R S = [ri (g) si ... yk^sk] 

where denotes the kronecker product. The solution 
to (j2j) can be given conveniently by the formula 



Tit 



F = F(^)[(R0S0T)^] 



(3) 



where f denotes the Moore-Penrose pseudo-inverse. 

Given a reference image, we compute the SIFT 
feature for every image pixel and arrange all SIFT 
features into a four-dimensional array, or a four- way 
tensor, !J G ]R^i><^2x/3x/4^ ^f^-g^ performing its CP 
decomposition by the ALS algorithm described pre- 
viously, the Ii by K matrix F can be considered 
the reference Tensor-SIFT of reduced dimensional- 
ity, and I2 by K matrix R, by K matrix S as 
well as /4 by K matrix R can be regarded as the 
tensor "basis matrices" . When a new image is avail- 
able, we compute the dense SIFT features for every 
pixel; each SIFT feature is a 1x4x4x8 tensor that 
can be approximated by projection (|3]). The resulting 
Ihy K vector is the corresponding Tensor-SIFT of 
reduced dimensionality. Fig. [2] shows an example of 



the tensor decomposition (i^T = 3), where Fig. 2(a) 



shows a reference image (left), the scatter map of 
its tensor-SIFT (middle), and the tensor-SIFT im- 
age (right). Fig |2(b)| shows four images (top row) 
and the corresponding tensor-SIFT images (bottom 
row). Note that the values of the Tensor-SIFT are 
scaled to [0, 255] for each dimension. 



4 Region Functional Formulation and Its 
Optimization 

This section begins with the formulation of the re- 
gion functional based on EMD (section l^?T]) and then 
introduces a two-phase method for the functional op- 
timization: the first-phase for solving EMD by the 
simplex algorithm (section 14. 2p and the second for 
PDF derivation by shape derivative (section lO]) . 



Following Rubner et al. [48], the reference and can- 
didate models are denoted by Signatatures (h*,p^) 
for = 1, • • • , [/ and (h^, qy) for = 1, • • • , V, re- 
spectively, where h* (h^) and Pu {qv) denote the 
uth subspace center and the corresponding distri- 
bution in the reference (candidate) object feature 
space. The distance between the reference and can- 
didate models may be interpreted as the classical 
transportation problem that can be solved by the 
simplex algorithm. 

There are many methods for distribution model- 
ing, either parametric or non-parametric. In this pa- 
per we use the non-parametric kernel method [8] be- 
cause it does not assume any prior distribution of the 
data and considers the (weak) spatial information. 
Suppose that the feature space of the reference ob- 
ject is divided into U subspaces (or histogram bins), 
and the object is described by a complex image re- 
gion with a boundary F*. Suppose that the object 
center is at the coordinate origin. Let z* = [x* y^]^ 
be a point in the region 1]* with feature vector I(z*), 
the reference distribution for = 1, •••,[/ of the 
object can be represented by 

1 



Pu 



^ ^(Z*)^n(l(z*)) 



(4) 



where w{-) is a kernel function, and (5^(I(z*)) denotes 
the Kronecker delta function that equals 1 if I(z*) 
belongs to the uth subspace, and otherwise equals 
0. Note that the kernel function weights the point's 
contribution: one point that is closer to the object 
center makes more contribution than one that is far 
away. 

In the course of tracking the object shape may 
undergo non-rigid deformation. Let ft denote the 
candidate region of the object with boundary F, its 
center Zc = {xc^Vc) may be computed as 
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Fig. 2 Example of the Tensor-SIFT (K = 3). (a) The dense SIFT features are computed in the reference image (left), 
and by the ALS algorithm the three-dimensional Tensor-SIFT are produced; the corresponding scatter map (middle) 
and tensor-SIFT image (right) are shown, (b) Given four new images (top row), the corresponding Tensor-SIFTs are 
computed and the tensor-SIFT images are shown (bottom row). Note that the values of the Tensor-SIFTs are scaled to 
[0, 255] for each dimension. 

where denotes the number of points in the region 
ft. Then the candidate distribution qv{^) for v = 
1, • • • , y can be described by 



u V 



(7) 



u—l v=l 



subject to 



qy{n) 



w{zi-Zc)Sy{l{zi)) (5) 



Note that the symbols in the above equation have 
similar meanings to their counterparts in (jH), and V 
indicates that the candidate object is divided into V 
subspaces. 

The region functional that measures the distance 
(dissimilarity) between the reference and candidate 
models is defined as follows: 



E 

^ — ^u=l ^ — ' 
^uv U = l 



V 



min(y^ p 



^ — ^v=l 



argniin f{Q) 



(6) 



Note that in our problem, U = V, and the kernel 
densities are normalized so that the last constraint 
equation is redundant; 2) U=V. In the above equa- 
tions the ground distance duv between the clusters 
u^v may be interpreted as the cost in transporting 
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the unit goods from the provider u to the consumer 
V. Thus, the tracking problem can be formulated as 
how to seek a candidate region Q with a complex 
shape r, so that its distribution {qy{Q)^v = !,••• , V} 
has the smallest distance with the reference distri- 
bution {pu^ = 1, • • • ,17} . 

Optimizing the region functional (j6]) is nontrivial. 
We propose a two-phase method for the functional 
formulation. In the first phase, let the candidate con- 
tour be fixed. EMD is computed as the best solution 
to the transportation problem using the simplex al- 
gorithm. Next, using the shape derivative theory, we 
make a perturbation analysis of the contour around 
the best solution, thus deriving the corresponding 
PDE that governs the candidate contour evolution. 
Note that a similar two-phase method is used for 
functional minimization in |6„ Sec. 2.2]. 



4.2 Simplex Algorithm for Solving EMD 

Let the image region Q be fixed. The problem de- 
scribed by (O becomes the classical transportation 
problem, which can be solved by the simplex algo- 
rithm. Equation ([7]) can be expressed in matrix form 
as follows (the subscript Q is omitted because Q is 
fixed) : 



min / = c"^x 

X 

subject to Ax = b 

X > (8) 



Here c, x are both vectors of UV, b is a vector of 
+ V + 1, and A is a matrix of /7 + V + 1 by UV; 



they have the following forms, respectively: 



c = 


dii di2 • 


• • div 
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In the matrix A all the elements e^j = 1, i = 1, • • • , /7, 

j = i,---,v. 

Let X = [xb xn] be an initial feasible solution, 
where xb and xn denote the vectors of basic vari- 
ables and non-basic variables, respectively. Accord- 
ingly, the vector c can be written as c = [cb cn]. 
In a similar manner, the matrix A can be written as 
A = [B N], where B, comprising columns in A that 
correspond to the basic variables, is a basis of the 
Euclidean space W^{m = U + F + 1), and N con- 
sists of columns in A corresponding to the non-basic 
variables. 

Starting from the initial feasible solution, the sim- 
plex algorithm proceeds through an iteration pro- 
cess. This process can be explained in tableau form. 
Table [T] illustrates the initial simplex tableau, where 

and I denote the zero vector and unitary vector, 
respectively. Each iteration seeks an improved feasi- 
ble solution that decreases the value of the function 
/. Consider one iteration. The most negative com- 
ponent of = CgB~^N — < is determined, 
and the corresponding column in N, denoted by Nj, 
is selected as the vector to enter the basis matrix B. 
Next, y = B~^Nj and b = B~-^b are computed and 
then the pivoting row index i is decided for which 

1 = max{?//c/6/e, ^fe > 0}, where yk and bk are com- 

k 

ponents of y and b, respectively. After the pivot ele- 
ment is determined, we can perform Gaussian elimi- 
nation to update B~^. The column vector in B that 
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Table 1 The simplex tableau 





/ xg 


T 


XB 


I 


B-iN B-ib 


/ 


1 -eg 


cgB-iN-c^ cgB-ib 



corresponds to the zth basic variable of xb is selected 
to leave the basis matrix B. 

Once > is identified during the iterations, 
the best solution to the transportation problem is 
obtained, and the iteration should be terminated. 
The EMD between the candidate and reference mod- 
els is given by 



/(l^) = c^B-^b 

V u 



(9) 



V = l 



where 



. Cb(i)B" V = I,-- - , V 

i—i 

Cb(i)B-^(i,^i + V), u = l,--- ,U 

1=1 

Cb(0B-^(i,/7 + F + l) 



Here Cb(^) denotes the zth component in the vector, 
and B~^(z, v) the entry of row i and column v in the 
matrix. 

The best solution always exists for the balanced 
transportation problem [30]. Note that in such case, 
the matrix A is not full-rank (U + V — 1 in our 
case). We introduce artificial variables and obtain 
the initial feasible solution according to the method 
described in Luenberger and Ye [sol, PP- 50-54]. 



4.3 PDE Derivation with Theory of Shape 
Derivative 

After the first phase is completed, we can perform a 
perturbation analysis around the best solution to the 
transportation problem. That is, let 1^ be a variable 
and solve the functional ([9]). The second and third 
terms on the right-hand side of (|9]) are irrelevant 
to and therefore can be be discarded. Hence, the 



functional to be optimized becomes 

V 



(10) 



Deriving the PDE associated with this functional is 
not straightforward. Here the theory of shape deriva- 
tive is used. 

4-3.1 Shape Derivative 

Given an initial domain (region) the set of all the 
possible deformations may not be a vector space. 
The theory of shape derivative introduces a family 
of deformations (transformations) {1^^-, r > 0} such 
that the perturbation of Q is possible with respect 
to r. 

Suppose that the perturbation of a point z e ft 
is governed by the differential equation 

dz(r) 



dt 



V(z(r)), r >0, z(0) =z 



(11) 



where V is a vector field. We can define the trans- 
formations of a point and the region as follows: 



T(r,z) = z(r), r > 0, z(0) = z 
ftr =Tr{n) = {T(r,z), zeQ} 



(12) 



The region functional J{Q) = (/>(^) is said to 
have a Eulerian derivative or a shape derivative^ at Vt 
in the direction of the vector field V if the following 
limit exists and is finite 



dJ(Vt', V) = lim 



(13) 



The functional JiQ) is shape differentiable at VL if 
the Eulerian derivative dJ{Q] V) exists for all direc- 
tions V and the mapping V J{^}; V) is linear and 
continuous. 

The material derivative of a function (/)(1^) at ft 
in the direction of the vector field V is defined as 
0(1]^) oT(r,z) -(/)(!]) 



(I){n; V) = lim 



(14) 



And the shape derivative of a function (j){ft) at ft in 
the direction of the vector field V is defined as 



i)Uft;V) = lim 



(t){ftr)oT{0,z)-(t){ft) 



(15) 
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The following theorem establishes the relation 
between the Eulerian derivative and the shape deriva- 
tives: 

Theorem 1 (Theorem on the Eulerian and shape 
derivatives) // the material and shape derivatives of 
a function exist, then the functional J{rt) = 

^(^^) is shape differentiahle and the following equa- 
tion holds 

dJ{n;V)= [ (l)\n;V)dz- [ (l){n){V,N)ds (16) 
Jq Jr 

where (V, N) denotes the inner product between V 
and N. 

For the complete and strict definitions of and the- 
ories about shape derivative, refer to 50 1 and [l2|. A 
brief yet insightful introduction about shape deriva- 
tive is given in Aubert et al. [l|. 



4.3.2 Derivation of the PDE through Shape 
Derivative 

Our derivation of the PDE to optimize the region 
functional builds upon theorem 1. First we write the 
candidate distribution in a continuous form: 

qy{Vt) = — - / ^(z - Zc)^^(I(z))dz 

(17) 

where the Gaussian kernel w{z — Zc) = exp(— ||z — 
Zc|p/(2<j^)) is adopted, and the object center Zc = 
[xc Vc]^ takes the form 

To facilitate derivation, we write Xc^yc in the form 

^- = S4hI' Gl(^^)= / xdz, Go(fi)= / dz 
<J^o("j Jq Jn 

G,ia) = [ ydz (18) 



We also re-express the candidate distribution as fol- 
lows: 



qy{n) 



Ki{n)= [ Li(l])dz, Li{n) = w{z - Zc)Sy{l{z)) 
Jq 

K2{n)= f L2{n))dz, L2{n) = w{z - Zc)) (i9) 

The Eulerian derivative of the region functional 
(p^Oj) is given by the formula 



df{n;V)=Y.Uq,{n;V) 



(20) 



V 



dK2 



where dq^/dKi is the partial derivative of q^ with 
respect to Ki and dqv/dK2 has a similar meaning. 

Now consider the dKi{Vt] V) in ([2Q|). From theo- 
rem 1 we have 

dKi{Q.',\)= I L\{Q.',\)dz- I Li{n){V,N)ds 
Jn Jr 

(21) 

From (p!8|) and (p!9|) , Li is a function of Go, Gi and 
G2 which are all functionals of Q. Hence, using the 
chain rule of the shape derivative we get 

L[{n;V) = |^dGo(^^;V) + |^dGl(^^;V) 



dGo 
dLi 

dG2 



dGi 



dG2{^l;V) 



(22) 



The shape derivatives of the integrands in Go, Gi, 
and Go are all zero because they are independent of 
Hence, application of theorem 1 to Go, Gi, and 
Go, respectively, leads to the following equations: 



dGo{n;\) = - J (V,N)ds 



Vc 



Go{n)'' 



dGi{n;\) 
dG2{n;V) 



x{\,N)ds 
y{V,N)ds 



(23) 
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Note that the partial derivatives of Li with respect 
to Go, Gi and G2 are in the fohowing forms: 

{{X - Xc)Xc + - ^c)^c)(^^(I(z)) 



dLi 


W'{z - Zc) 


dGo ~ 


a^Go 


dLi 


W'{Z - Zc) 


dGi ~ 


'J^Go ' 


dLi 


W'{Z - Zc) 


dG2 ~ 


a^Go ' 



{X - Xc)Sy{l{z)) 

{y - yc)^v{l{z)) 



(24) 



Combining (|2T]), (|22|), (|23|), and (|24l) and after some 
manipulation, we obtain 



(z - Zc)^(z - Zc)i(;'(z - Zc) 



(5^(I(z))dz - w{z - Zc)Sy{I{z))^ (V,N)ds (25) 

where z = [x ^]-^ is a point in the region and 
z = [x y]^ is di point on the boundary T. 
Likewise, we can derive 

[ ( ¥Fr [ {z-Zc)^{z-Zc)w\z-Zc)dz 

Jr \ cr^o Jn 

-^(z-Zc))(V,N)ds (26) 



Note that dq^/dKi = 1 / K2anddqy / 8X2 = qv/K2. 
Substituting (|25|) and (|26|) into (|20|), we have the 
complete expression of the Eulerian derivative of the 
region functional f{^): 



df{n;V) = I F{V,N)ds 



where F is of the following form 

^ = - 2n ^ / ~ ~ Zc)w\z - Zc) 

^ly{6y{l{z)) - Qy) j dZ 



v=l 



- ^w{z -z,)(Y1 ^Ml{z)) - qy)] (28) 

^ ^ y — l ^ 

From the above shape derivatives, we obtain the 
PDE associated with the region functional (p^Oj) gov- 
erning the contour evolution 



dr 

dr 



FN 



(29) 



The preceding PDE is obtained when the nor- 
mal kernel is employed. If other kernels are used, we 
can derive the corresponding PDEs in a similar man- 
ner. Table [2] lists F with respect to the normal ker- 
nel, Epanechnikov kernel and uniform kernel, respec- 
tively. The first term in F, when the non-uniform 
kernels are used, is due to the spatial weight intro- 
duced, reflecting the combined effect of one point z 
on the contour and the points within the region. Our 
experiments show that the normal kernel performs 
well and thus it is used in this paper. The parame- 
ter (J is selected as half of the radius of the minimal 
enclosing circle of the contour. 

Similar to Aubert et al. []]], we also impose a 
smooth constraint on the contour by adding a second 
term a J^^ds in Eq. ([6]), where a is a constant. The 
minimization of this curve length term gives rise to a 
curvature term in the PDE [5[. Thus, our final PDE 
has the following form [l|: 

dT 



dr 



{F + af^)N 



(30) 



where is the curvature of the curve F. 

4.3.3 Solution to PDE through the Level Set Method 

The idea of the level set method is that at any time 
the contour F is implicitly represented by a zero level 
set of a higher-dimensional function 0, i.e.. 



(27) F(z,r) = {z : 0(z,r) = 0}, given F(z,0) 

We define 0(z, r) < in the interior region and 
0(z, r) > in the exterior region. Taking the par- 
tial derivative of ^ with respect to r and using the 
chain rule, we have 



del) dT ^ 



(31) 



where = [(j)x (j^y]^ is the gradient of (j) with re- 
spect to z. Note that N = -V0/||V0|| and 1^ = 
— div (V(^/|| V(^||), where div denotes the divergence. 
Substituting (|3Q]) into (|3T]) . we have the following 
PDE: 



V0 A 

\m\J 



\m 



(32) 
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Table 2 The function F with respect to various kernel 



Kernel 



The function F 



Normal kernel 

«,jv(i) = exp(-Mz|^) ien 



Epanechnikov Kernel 
1 - II 





we{z) ■■ 



^^||2 zen, ll^^ll < 1 

otherwise 



Uniform Kernel 

1 ien 



wu 



otherwise 



^ fjz - z.)^(z - Ze) (Er=i lv{Sy{m) - qy)) d~z 
-^we(z - Zc) (j2v=l lv(Sv(I(z)) - qv)^ 



1 f ^ 

— [Y,lM\(z)) - qy) 



Applying to (|32]) the first-order forward time dif- 
ference scheme, the upwind scheme for the first-order 
discretization of F, and the first and second-order 
central difference schemes for the curvature we 
have the following discrete equation: 



:(/>[,^.+ Ar (max(i^^.,0)2V 
-min(i^^.,0)^V^" +a< 



r+ 



(33) 



where <^\j denotes the function value on the grid 
point at iteration step t, I\t denotes the dis- 

crete time interval, and V^"*", V^" and take the 
following forms, respectively: 



= (max(Z)-/,0)2 

+ max(D-J,0)2 4- 
= (max(D+/,0)2 

+ max(Z)+/,0)2 + 



+ min(D+;,0) 
min(Z)+/,0)2)'/' 
+ min(D-/,0)2 



•)xy 

'■J 



Here a short-hand notation is used where the oper- 
ator D~^(j)Jj is written as ^ etc. The operators 
involved are defined as follows: 



D 
D 



Ox 



D, 



-\-x+y 



^r-i,,-i)/2 



The operators D^'j , Djj, D-^- and Djj'^ have sim- 
ilar forms. At each iteration step r, numerical sta- 
bility can be guaranteed by the following CFL con- 
dition H 



Ar max 



^ abs(I^+/) + abs(I^,^/) 

(^0.)2)l/2 



4a > < 1 



(34) 



The implicit function (j) can be arbitrary and is 
usually chosen as a signed distance function 0(z, r) = 
d{z^r). As suggested [36|, ch.7], in the level set algo- 
rithm, the re-initialization technique is required pe- 
riodically so that 0(z, r) still remains a valid signed 
distance function during the evolution process. We 
use the fast marching method proposed by Sethian 
49|, ch.8] for re- initialization and the re-initialization 
frequency of 50 suffices for all of our experiments. 



5 Integrated Contour Tracking Algorithm 

This section describes the integrated tracking algo- 
rithm (section [5. ip . followed by the discussion of the 
implementation detail (section 15. 2p and the compu- 
tational cost analysis (section [53]) . 



5.1 Integrated Tracking Algorithm 

The level set algorithm is computationally expen- 
sive. In tracking applications, because we are only 
concerned with the object contour, i.e., the zero level 
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set, we update in each iteration only the values of 
the level set function in a narrow band in the neigh- 
boring region around the current zero level set. Re- 
initialization of the level set function using the fast 
marching algorithm is also performed within the nar- 
row band. 

In contour tracking algorithms, it is not straight- 
forward to exploit temporal continuity. Rathi et al. 
46| proposed a method in the particle filter frame- 
work. However, the high computational cost of the 
particle filter plus that of the contour evolution us- 
ing the level set algorithm may be a very heavy load. 
We introduce a scheme that uses the mean shift al- 
gorithm [st for exploiting temporal continuity, which 
provides inter-frame contour initialization. In frame 

t, before PDE iteration starts, we first use an ellipse 

I ! 

fitting method [14] to fit the contour — 1), i.e., 
the tracking result in frame t — 1. Next, we perform 
the mean shift tracking algorithm until its conver- 
gence and then enlarge the major and minor radii of 
the resulting ellipse by twenty percent, respectively. 
Let (xo, ^0, ^7 ^) be the parameters of the resulting 
ellipse, where (xo,^o) denotes the ellipse center, a, 6 
denote the radius along x— and y— axes, and 6 in- 
dicates its orientation. We construct the initial level 
set function through the following formula 



{{i - xq) cos 0^{j - yo) sin 0^ 

{{i - Xq) sin - {j - yo) cos 0)'^ 
62 



1 



This scheme gives a kind of contour measurement, 
taking advantage of temporal continuity and thus 
providing a more accurate initial contour that helps 
avoid possible local minimum in gradient descent it- 
eration. 

There is no general stopping criterion in the level 
set algorithm and the fixed number of iterations is 
often used. As EMD is robust in measuring the dis- 
tance between the target and candidate models, we 
introduce a stopping criterion that ends the iteration 
if the EMD does not decrease. Practically, we keep 
the latest 20 values of EMD to which a straight line 
is fitted by the least square method, and terminate 
the iteration process if the line slope is larger than 



or equal to zero. Due to temporal continuity, the 
object size change may not be large. Thus, we also 
terminate the iteration if the area difference between 
Q*{t — 1) an Qr{t) is larger than ten percent. 

Given the primary considerations above, we now 
describe the complete tracking algorithm in Algo- 
rithm [H 

5.2 Implementation Detail 

This section discusses the implementation detail in 
the tracking algorithm. 

Selection of K in tensor decomposition The object 
size is usually small in a tracking task; thus, we 
select 8x8 sampling window in the SIFT feature 
computation. In the tensor decomposition problem, 
what K best fits the original tensor is the prob- 
lem of the best rank-/c approximation that is un- 
der investigation Q, which is beyond the scope of 
this paper. Intuitively, larger K may better explain 
the original tensor, which, however, also means the 
larger dimensionality of tensor-SIFT. In this paper 
we demonstrate conceptually our idea by simply set- 
ting K = Z. 

Parameters setting in inter-frame initialization Inter- 
frame initialization mainly concerns the mean shift 
iteration. Following the histogram of 16x16x16 
bins and the maximum iteration number 10 are used 
for all experiments. Note that in this case the ellipse 
describing the object may not be upright, and we 
use the ellipse filling algorithm for efficient histogram 
computation. 

Computations of Signatures and EMD In accordance 
with 



48|, the ground distance duv is defined as 



du 



i_e-/^iih:-h.ii ^ith p = \\[Ci ... Ck] 



where C/e denotes the standard deviation of compo- 
nent /c, learned from all reference feature vectors. 
duv saturates to 1 for a large distance between two 
clusters, avoiding the side effect that few large dis- 
tances brings on the overall distance. The efficient 
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22 
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24 end 



Provided the reference image of the object, 
compute its dense SIFT features; make tensor 
decomposition using ALS algorithm in section [ 
achieving the reference tensor-SIFT image and the 
tensor basis matrices R, S and T; 
Produce the reference signature (h*,pn) for 
u = 1, - • • ,U with tensor-SIFT features; 
Initiahze manuaUy the contour in the first frame 

^o(^ = 1); 

for t = 2, 3, . . . do 

For the image D(t, z), compute its SIFT 
features and then the corresponding 
Tensor-SIFT image I(t, z) by projection 
from (|3}; 

Construct the initial level set function (f)Jj^{t) 
from the tracking result — 1), using a 
scheme of inter-frame initialization (described 
in the text); 
r = 0; 

while Sloping criterion is not true do 

Produce the candidate signature {Yiv^Qv) 
for V = 1, - • • ,V within 

nr{t) = {{i,jMj{t)<o}; 

Calculate the EMD between the reference 
and candidate signatures by the simplex 
algorithm in section [4.21 
Extract a narrow band Snarr in the 
neighboring region around the zero level set 

for each grid point {i,j) G Snarr do 
Compute its force F[j] 
Compute its curvature j 

end 

Calculate the discrete time interval Ar 
from (l3ll) : 

for each grid point {i,j) G S do 

Update the level set function (l)ij{r) 
from (l33l) 

end 

r = r + 1; 

Re-initialize the level set function every 50 
iterations using the fast marching 
algorithm; 

end 

Assign tracking result f2* (t) 



: Qr(t). 



Algorithm 1: Tracking algorithm 



K-D tree-based clustering method [48| is employed 
for color space partition. 

Selection of a in PDE The term of the curve length 
functional leads to a curvature term in PDE (|3Q]) , 
which makes the contour to have a tendency to shrink 
and actually imposes a smoothness constraint on the 
contour. The value of a reflects the effect of this 
term and is empirically set to 0.0002 in all the ex- 
periments. 



5.3 Computational Cost Analysis 

In the preceding tracking algorithm, the procedures 
that dominate the computational cost are (a) com- 
putations of SIFT features and Tensor-SIFT image; 
(b) inter-frame initialization; and (c) functional op- 
timization. 

As dense SIFT features are required, we perform 
the computation for each grid point in the image. 
With the SIFT features at hand, the computation of 
the Tensor-SIFT image is accomplished by matrix 
multiplication, because as shown in (jSj), the Moore- 
Penrose pseudo-inverse can be obtained beforehand 
once the tensor basis matrices are produced. Thus, 
the cost of this procedure may be written in the form 
Ctens = Nsift{csift + Qen.dec), whcrc Ngift is the 
number of grid points, Cgift is the cost for computing 
one SIFT feature, and Cten.dec = I'l^Kcops denotes 
the cost for computing Tensor-SIFT by projection, 
with Cops indicting the cost of long operation (multi- 
plication or division). In our experiments, the com- 
putations of SIFT features and Tensor-SIFT images 
are accomplished off-line. Note that these computa- 
tions are well-suited for parallel implementation [ssj . 

The inter-frame initialization mainly involves el- 
lipse fitting, mean shift algorithm, and the initial 
level set function construction. The ellipse fitting 
procedure is principally concerned with the general- 
ized symmetric eigenvalue problem [l^ , whose com- 
putational cost is approximately multiples of long 
operations [40, Chap. 15], where n is the matrix size. 
In this problem n = 6, thus, its computational cost is 
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small compared with the other procedures and can 
thus be discarded. The cost of the mean shift al- 
gorithm can be represented as NmsC-ms-, where Nms 
denotes the mean shift iteration number, and Cms 
denotes the cost in one iteration. For details, refer 
to Comaniciu et al. |9|. The cost of the initial level 
set function construction may be approximated as 
Neons ^ 8^Cop5, where A denotes the image area. 



The procedure of functional optimization mainly 
involves two parts: computing the EMD and the level 
set function update. The EMD computation consists 
in the simplex algorithm, whose computational com- 
plexity is theoretically exponential in problem size 
m. However, in cases where m is not large, the it- 
eration number N^rnd for the simplex algorithm to 
converge from a feasible solution to the best one may 
be a small multiple of m: TVemd = 2m~3m 



30|, chap. 



5]. Thus the cost of computing the EMD may be 



written as Ce 



-N emdi^^pivo H~ ^gaus^-) where Cp 



denotes the cost to obtain the pivot element, and 
Cgaus denotes the cost for performing the Gaussian 
elimination for the linear system of constraint equa- 
tions. 



The procedure of the level set function update 
mainly involves the evaluation of force F and cur- 
vature n. As we re-initialize the level set function 
by fast marching algorithm only once every 50 itera- 
tions, the cost induced by this process is thus omit- 
ted. Denote by c force and the cost for the force 
and curvature at one grid point. The cost for the level 
set update is represented by Cpde = \Snarr\{c force + 
c^:), where \Snarr\ dcuotcs the grid point number in 
the narrow band Snarr- The dominant operation in 
Cpde is the computation of the force F. From ([28]) . 
the cost for evaluation of F has the form c force = 
|l]|c/, where \Vt\ denotes the number of grid points in 
the candidate region Vt^ Cf denotes the cost involved 
by one grid point, mainly including the cost of dozens 
of multiplications and one exponential function eval- 
uation. 



In summary, the overall computational cost Ctota 
of the tracking algorithm may be described as 

Ctota ~ Cinter H~ Cteris H~ Coptm 

Cinter H~ Ct^ris H~ ■NoptmiCernd H~ ^pde) 
-Nms^ms H~ ^ACops 

+ Nsift{csift + l2^Kcops) 

H~ -N optmi^-N emdiS-'pivo H~ ^gaus^ 

+ |^narr|(|^|c/+C^)) (35) 

Here, Coptm denotes the cost of the functional op- 
timization, and Noptm denotes the number of itera- 
tions involved. Noptm is application dependent and 
is averaged by several hundreds in our experiments. 

6 Experiments and Discussion 

We assess the performance of three trackers. Color _EMD, 
PCA_SIFT_EMD, and Tensor _SIFT_EMD, which are 
all based on EMD but use varying features, as their 
respective names indicate. We also compare our track- 
ers with the tracker based on the Bhattacharyya co- 
efficient that use color histogram [16] (henceforth 
Color_Bhattacharyya for short). Note that an im- 
proved algorithm [57| is presented that exploits the 
background information to enhance the performance 
of Color .Bhattacharyya; we do not compare this im- 
proved one with ours because in the current work we 
do not consider any background information. 

For quantitative analysis, we use the the follow- 
ing error metric Q called overlapping region error to 
compare tracking accuracy 



E = 1 



(36) 



where | • | denotes the cardinal number of a set and 
Sresu 7 Sgrou represent the image regions of tracking 
result and the ground truth, both expressed as sets 
of image points. The error E equals if the two 
regions are identical, increases when the overlapping 
region becomes small, and equals 1 if the two do 
not overlap at all. In our experiments, we declare 
the tracking failure if the error E of more than five 
consecutive frames are larger than 0.8. 
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6.1 Qualitative and Quantitative Comparisons 

The first video scenario is a very big fish tank with 
small cobblestones underneath. In the video clip, 
some fishes with pebbles on their bodies swim freely. 
The video clip is recorded with a hand-held camera 
with 430 frames (size: 352x288 pixels). The hand- 
held camera continuously moves; hence, the viewing 
angles change from time to time, causing illumina- 
tion variation and background motion. 

For comparison purpose, we vectorize the SIFT 
features as commonly done in the literature [ll|, |25, 
27H29|, ba, and exploit the PCA for dimension- 



ality reduction (henceforth PCA-SIFT). Here, we 
visually compare PCA-SIFT (with three dominant 
principal components) and Tensor-SIFT (with three 
component tensors). Given the reference image [the 
left-most image in Fig. |2(a)| of the fish, we compute 
its SIFT features and construct the corresponding 
PCA basis and tensor basis. As shown in Fig [3l 
for the new images available (first column), after 
computing their SIFT features, the corresponding 
features of reduced dimensionality can be achieved 
through the projection by PCA basis (second col- 
umn) and the tensor basis matrices (third column). 
Although we know where the fish is, distinguishing 
it from the background is difficult in the PCA-SIFT 
images because they are so similar, whereas in the 
Tensor_SIFT images, the fish is distinct from the 
background. This is due to the PCA losing the spa- 
tial layout inherently present in the data, resulting 
in similar appearances of the foreground and back- 
ground. 

Fig. S] illustrates in each frame the overlapping 
region errors of the four trackers. Before the fish 
swims into the background of cobblestones in frames 
1-50, PCA_SIFT_EMD (yellow dotted line with "A" 
marker) is much better than Color _EMD (blue dashed 
line with "x" marker). However, afterwards, the track- 
ing errors of PCA_SIFT_EMD (brown dotted line 
with "A" marker) increase sharply and become larger 
than those of Color _EMD. This is not surprising, 
because at this time, the object is nearly indistin- 



guishable from the cobblestone background in the 
PCA_SIFT images. The two trackers diverge suc- 
cessively at about frames 160 and 180, respectively. 
Color_Bhattacharyya (green dash-dotted line with 
"V" marker) shrinks to some very small regions on 
the fish in frame 45 and never recovers. Tensor _SIFT 
_EMD (red solid line with "□" marker) demonstrates 
the best performance: it is stable and accurate in fol- 
lowing the fish throughout the sequence. Fig.[5]shows 
the typical tracking results of the four trackers. 

The second image sequence, a car sequence, is 
300 frames long (size: 352x288 pixels); it involves a 
white minivan (object) on a busy road. This image 
sequence is also captured by a hand-held camera. At 
the beginning the minivan is in the shadow of the 
skyscrapers. It turns right, and near the zebra cross- 
ing, gradually goes out of the shadow and runs to 
the sunshine at about frame 60, giving rise to sig- 
nificant illumination variation. In frames 115 to 160, 
due to a dark blue minivan coming from the oppo- 
site direction, the object is occluded little by little 
until there is almost complete occlusion; gradually, 
the object becomes un-occluded once again. After- 
wards, two similar situations of occlusions, which are 
not as severe as the first one, occur consecutively in 
frames 180-210 and 220-245 due to a red car and 
then a silver gray car passing by. Finally, the object 
leaves the view and disappears at about frame 285. 

Fig. [6] shows the overlapping region errors of the 
four trackers in each frame of the car sequence. In 
this sequence, PCA_SIFT_EMD is better than Color 
_EMD in terms of both tracking accuracy and ro- 
bustness. As of their respective tracking failures, the 
average error (meaniSTD) of PCA_SIFT_EMD is 
0.3845± 0.1124, whereas that of Color_EMD is 0.4224 
±0.1494. The former fails at about frame 170, whereas 
the latter fails at about frame 185. The tracking re- 
sult of Color. Bhatacharyya soon shrinks to some 
very small regions around frame 95 after the car runs 
into the sunshine from the skyscraper's shadow. Ten- 
sor_SIFT _EMD exhibits the best performance, fol- 
lowing the object until its disappearance; although in 
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(a) Frame 95 



(b) Frame 190 




(c) Frame 235 

Fig. 3 Visual comparison of PCA_SIFT and Tensor _SIFT (the fish is the object). The reference image as shown in 
Fig. |2(a)| is used to construct the PCA basis (three dominant principe components) and Tensor basis matrices (three 
component tensor). For the new images (left column), features of reduced dimensionality, PCA_SIFT (middle column) 
and Tensor _SIFT (right column), can be obtained through projection by the PCA and by the tensor decomposition, 
respectively. Although we know where the fish is, distinguishing it from the background is difficult in the PCA-SIFT 
images because they are so similar, whereas in the Tensor_SIFT image, the fish is distinct from the background. 

the end tracking errors become large. Typical track- can follow the object throughout the whole image 



ing results of the four trackers are shown in Fig. [9l 
As discussed in 



28, 



29| , in the SIFT feature com- 



putation, color images are first transformed to gray- 
level images, where the SIFT features are extracted. 
Therefore, our Tensor- SIFT is well suited for both 
color and single-channel images. The third sequence 
(460 frames, image size: 253x288 pixels), recorded in 
a very crowded street, contains such single- channel 
(gray- level) images. In this sequence, neither Color 
_Bhattacharyya nor Color _EMD works, as in this 
case, the histograms contain inadequate information 
to distinguish the object from the background. In 
contrast, the two trackers based on SIFT features 



sequence. Fig. [8] shows the overlapping region errors 
in each frame using Tensor_SIFT_EMD (red solid 
line with marker) and PCA_SIFT_EMD (blue 
dashed line with "V" marker), respectively. Clearly, 
the errors of the former are generally less than those 
of the latter. The average errors are 0.3454 ±0.0752 
for Tensor_SIFT_EMD and 0.3692±0.0772 for PCA_SIFT_EMD. 



6.2 Discussion 

The trackers that adopt the same distance measure 
of the EMD but depend on different features demon- 
strate varying performance. PCA_SIFT_EMD has bet- 
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ter performance in the car sequence than Color _EMD 
in both tracking accuracy and robustness. However, 
in the fish sequence, the latter outperforms the for- 
mer because in the PCA_SIFT images, the object 
(fish with pebbles on the body) and the background 
(cobble stones) have a similar appearance and are 
difficult to distinguish. In contrast, in the correspond- 
ing Tensor _SIFT images, the object is distinct from 
the background. Invariably, in both sequences, the 
tracker Tensor_SIFT_EMD shows the best perfor- 
mance, despite the challenging conditions of the clut- 
tered background, illumination variation, and occlu- 
sion. The comparisons show that Tensor-SIFT is very 
distinctive and robust to challenging conditions. 



Comparison between the two trackers, which use 
different distance (or similarity) measures of EMD 
and Bhattacharyya coefficient but the same color 
features, shows that Color _EMD is more robust to 
challenging conditions than Color_Bhatacharyya. In 
the fish and car sequences. Color _Bhatacharyya soon 
converges to very small regions on the object when 
illumination changes. Note that this phenomenon is 
also observed in the mean shift tracking that uses 
Bhattacharrya measure and color Q. The contrast 
shows that EMD is more robust than the Bhattachar- 
rya coefficient and is thus more competent in track- 
ing applications in adverse environments. 



Tensor-SIFT is suitable for both color and gray- 
level images because SIFT features are actually ex- 
tracted in single-channel images [28|, |29|. This is a 
good property because it bridges the gap between 
the color and gray-level images for tracking applica- 
tions. Most color-based trackers fail in single-channel 
images, as gray- level information only does not suf- 
fice in distinguishing the foreground and background. 
The experiment in this sequence shows that the two 
trackers based on Tensor-SIFT and PCA-SIFT both 
works well. In addition, Tensor-SIFT is superior to 
PCA-SIFT in tracking accuracy. 



7 Conclusion 

This paper presents a robust method for contour 
tracking in adverse environments. We verified the 
performance of the proposed method in image se- 
quences containing the cluttered background, illu- 
mination variation, and occlusion et al. Experiments 
show that the proposed method has promising per- 
formance. In summary, our primary contributions 
are as follows: formulation of an EMD-based func- 
tional and a two-phase method for its optimization; 
design of Tensor-SIFT features; and development of 
an integrated contour tracking algorithm. 

First, we formulate a region functional based on 
EMD with kernel density for distribution modeling 
and propose a two-phase method for the functional 
optimization. In the first phase, computing the EMD 
is modeled as a transportation problem that is solved 
by the simplex algorithm. Next, using the theory of 
shape derivative, we perform a perturbation analy- 
sis around the best solution to the transportation 
problem, producing a PDE that governs shape evo- 
lution along the gradient descent direction of the re- 
gion functional. The application of the Wasserstein 
distance in the one-dimensional case has been ex- 
plored in active contour framework, but its applica- 
tion in high-dimensional cases is difficult because its 
closed- form does not exist [ssl. To our knowledge, 
this work is one of the first attempts to apply the 
high-dimensional Wasserstein distance (EMD) in the 
active contour framework. 

Second, we design a novel feature called Tensor- 
SIFT for tracking applications. We present a ten- 
sor decomposition method for dimensionality reduc- 
tion of the well-known SIFT features. This SIFT- 
As-Tensor method can capture multiple factors in- 
herently present in the data, i.e., two-dimensional 
spatial layout and phase histogram of the gradient 
magnitude. In contrast, the traditional method of 
SIFT- As- Vector, e.g., PC A, can only capture one- 
factor information of the histogram. Tensor-SIFT, 
which is applicable to both color and gray-level im- 
ages, is very distinctive and insensitive to illumina- 
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tion change and noise. These good properties plus 
its low dimensionality enables it to be feasible for 
potential applications in some other fields such as 
image classification and image retrieval. 

Third, we develop an integrated algorithm for ro- 
bust contour tracking tasks. The algorithm employs 
various techniques for tracking applications, e.g., the 
simplex algorithm, and narrow-band level set and 
fast marching algorithms. Particularly, we present a 
scheme that employs the mean shift algorithm for 
inter-frame contour initialization. It can provide a 
more accurate initial contour and thus helps PDE 
avoid possible local minimum during its gradient- 
descent process. We also introduce a realistic stop- 
ping criterion for PDE iteration. The stopping crite- 
rion is important but is no discussed broadly in the 
literature. Our method is simple yet effective, and is 
fit for automatic termination in PDE iteration. 

The main problem of the proposed method is its 
high computational cost, as analyzed in section [Ql 
To improve the computational efficiency, in future 
works, we will use the Graphics Processing Unit (GPU) 
to accomplish computational expensive procedures, 
including Tensor-SIFT computation, EMD calcula- 
tion and PDE solution. We are also interested in fur- 
ther enhancing the tracker's performance, by com- 
bining multiple image information such as background 
characteristics or optical flow. 
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Fig. 4 Overlapping region errors of the four trackers in the fish sequence. 
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(a) Results using PCA_SIFT_EMD in the PCA_SIFT images. Tracking fails at about frame 160; from left to right are 
frames 15, 60, 90 and 150. 




(b) Results using Color_EMD in the original color images. Tracking fails at about frame 180; from left to right are frames 
15, 60, 95 and 160. 



(c) Results using Tensor_SIFT_EMD in the Tensor_SIFT images. The tracker successfully follows the object throughout 
the sequence; from left-top to right-bottom are frames 15, 60, 95, 160, 220, 270, 300 and 375. 




(d) Results using Color _Bhatacharyya in the original color images. Tracking fails at about frame 45; from left to right 
are frames 1, 20, 40 and 55. 



Fig. 5 Typical tracking results in the fish sequence using the four trackers. 
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Fig. 6 Overlapping region errors of four trackers in the car sequence. 
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(a) Results using PCA_SIFT_EMD in the PCA_SIFT images. Tracking fails at about frame 170; from left to right are 
frames 20, 70, 120 and 165. 




(b) Results using Color_EMD in the original color images. Tracking fails at about frame 185; from left to right are frames 
20, 70, 120 and 165. 
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(c) Results using Tensor_SIFT_EMD in the Tensor_SIFT images. The tracker manages to follow the object until its 
disappearance; from left-top to right-bottom are frames 20, 70, 120, 165, 185, 205, 240 and 260. The small images 
imposed in the second row show the corresponding color scenes. 




(d) Results using Color _Bhattacharyya in the original color images. Tracking fails at about frame 95; from left to right 
are frames 20, 45, 70 and 95. 



Fig. 7 Typical tracking results in the car sequence using the four trackers. 
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Fig. 8 Overlapping region errors of Tensor_SIFT_EMD and PCA_SIFT_EMD in the gray-level pedestrian sequence. 
Their average errors (meaniSTD) are 0.3454±0.0752 and 0.3692±0.0772, respectively. 
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(a) Results using PCA_SIFT_EMD in the PCA_SIFT images. From left to right are frames 40, 150, 290 and 450. 
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(a) Results using Tensor_SIFT_EMD in the Tensor_SIFT images. From left to right are frames 40, 150, 290 and 450. 




(b) Results of Tensor_SIFT_EMD (brown solid lines) and PCA_SIFT_EMD (cyan solid lines) drawn in the original 
gray-level images. From left to right are frames 40, 150, 290 and 450. 

Fig. 9 Typical tracking results in the single-channel gray-level pedestrian sequence. 



